Network visualization (in R) with “netplot” and motif counting (in C++) with “barry”
SCI Seminar
Division of Epidemiology
University of Utah
2023-04-07
Research Assistant Professor of Epidemiology.
Ph.D. in Biostatistics from USC and M.Sc. in Economics from Caltech.
Methodologist working at the intersection between Statistical Computing and Complex Systems Modeling.
You can download the slides from
ggvy.cl/slides/sci2023
What: An R package for network visualization inspired by Gephi.
Why: Opinionated way to visualize graphs.
Current parameters in nplot (main function)
nplot(
edgelist,
layout,
vertex.size = 1,
bg.col = "transparent",
vertex.nsides = 10,
vertex.color = grDevices::hcl.colors(1),
vertex.size.range = c(0.01, 0.03),
vertex.frame.color = ...,
vertex.rot = 0,
vertex.frame.prop = 0.2,
vertex.label = NULL,
vertex.label.fontsize = NULL,
vertex.label.color = "black",
vertex.label.fontfamily = "HersheySans",
vertex.label.fontface = "bold",
vertex.label.show = 0.3,
vertex.label.range = c(5, 15),
edge.width = 1,
edge.width.range = c(1, 2),
edge.arrow.size = NULL,
edge.color = ~ego(alpha = 0.25, col = "gray") + alter,
edge.curvature = pi/3,
edge.line.lty = "solid",
edge.line.breaks = 5,
sample.edges = 1,
skip.vertex = FALSE,
skip.edges = FALSE,
skip.arrows = skip.edges,
add = FALSE,
zero.margins = TRUE,
...
)
Main features
Visualization engine: The grid system (the same one used by ggplot2).
Layout algorithms: Default uses igraph’s layout.
Vertex sizes: Relative to the drawing area.
The personal friendship network of a faculty of a UK university, consisting of 81 vertices (individuals) and 817 directed and weighted connections. The school affiliation of each individual is stored as a vertex attribute. This dataset can serve as a testbed for community detection algorithms.
Things to notice:
Vertex size autoscaled to the device size.
Edges colored by mixing ego and alter (source + target).
Edge colors change continuously along each edge (gradient).
Vertices and edges’ sizes scale as required by the user.
Graphical objects (Grobs)
List of 11
$ .xlim : num [1:2] -1 1
$ .ylim : num [1:2] -0.5 0.5
$ .layout : num [1:81, 1:2] 0.6661 0.0201 0.7327 0.5399 -0.4903 ...
$ .edgelist : num [1:817, 1:2] 57 76 12 43 28 58 7 40 5 48 ...
$ .N : int 81
$ .M : int 817
$ name : chr "graph.3"
$ gp : NULL
$ vp : NULL
$ children :List of 2
..$ background:List of 10
.. ..- attr(*, "class")= chr [1:3] "rect" "grob" "gDesc"
..$ graph :List of 5
.. ..- attr(*, "class")= chr [1:3] "gTree" "grob" "gDesc"
..- attr(*, "class")= chr "gList"
$ childrenOrder: chr [1:2] "background" "graph"
- attr(*, "class")= chr [1:4] "netplot" "gTree" "grob" "gDesc"
netplot supports advanced patterns. The figures feature radial gradients (vertices), linear gradients, and repeated patterns (background).
In the case of ggplot2 (and thus, ggraph)
While ggplot2 uses grid underneath it’s grammar API, these features are generally not directly available in ggplot2.
– Thomas Lin Pedersen, author of ggraph (source: tidyverse.org)
The gggrid package does:
The ‘ggplot2’ package does not yet have an interface for pattern fills, but the ‘gggrid’ package (Murrell, 2022) allows us to combine raw ‘grid’ output with the ‘ggplot2’ plot.
– Paul Murrell, author of grid (source: Vectorised Pattern Fills in R Graphics)
What: A C++ header-only template library for motif counting (and more.)
Why: Implement Discrete Exponential-Family Models [DEFMs] for phylogenetics and social network analysis.
Where: You can get it on GitHub (USCBiostats/barry)
About 11 K lines of C++ code built for statistical modeling:
Motif count using change statistics (we will return to that.)
Full and constrained enumeration of 0/1 arrays.
Computes probability function for Discrete Exponential-Family Models [DEFMs].
Memory and computationally efficient for pooled models.
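Full and constrained enumeration of 0/1 arrays can be sketched in a few lines. The snippet below is an illustration only and does not use barry's API: `BinaryArray`, `enumerate_arrays`, and the `allowed` predicate are hypothetical names, and the arrays are stored as flat row-major vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Flat, row-major 0/1 array (illustrative type, not barry's).
using BinaryArray = std::vector<int>;

// Enumerates every 0/1 array with `n_cells` cells that satisfies `allowed`.
// With the default predicate this is the full enumeration (2^n_cells arrays).
std::vector<BinaryArray> enumerate_arrays(
    std::size_t n_cells,
    const std::function<bool(const BinaryArray&)>& allowed =
        [](const BinaryArray&) { return true; }) {
    assert(n_cells < 20);  // the support grows as 2^n_cells; keep the sketch small
    std::vector<BinaryArray> result;
    const std::uint32_t n_arrays = std::uint32_t{1} << n_cells;
    for (std::uint32_t mask = 0; mask < n_arrays; ++mask) {
        // Bit `cell` of the mask is the value of that cell.
        BinaryArray array(n_cells);
        for (std::size_t cell = 0; cell < n_cells; ++cell)
            array[cell] = static_cast<int>((mask >> cell) & 1u);
        if (allowed(array))  // constrained enumeration: skip excluded arrays
            result.push_back(array);
    }
    return result;
}
```

For a 2x2 array (four cells) the full enumeration has 16 members; passing a predicate such as "at most one cell equals one" shrinks it to 5.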
Let’s look into the change statistics edgecount, triangles, and group-homophily when we remove tie 33-69. Here y- is the network without the tie and y+ is the network with it.
| s() | y- | y+ | change |
|---|---|---|---|
| Edgecount | 816 | 817 | 1 |
| Triangles | 5366 | 5399 | 33 |
| Group-homophily | 664 | 665 | 1 |
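The "change" column above is the statistic with the tie minus the statistic without it. A minimal sketch of that computation on a dense adjacency matrix follows. It is not barry's implementation: all names are hypothetical, and the triangle term is assumed here to be a count of transitive triads (i→j, j→k, i→k), which may differ from the definition behind the table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense, square 0/1 adjacency matrix of a directed graph (illustrative type).
using Adjacency = std::vector<std::vector<int>>;

// Number of ties in the graph.
int edge_count(const Adjacency& y) {
    int total = 0;
    for (const auto& row : y)
        for (int tie : row)
            total += tie;
    return total;
}

// Number of ties whose endpoints share the same group label.
int group_homophily(const Adjacency& y, const std::vector<int>& group) {
    assert(group.size() == y.size());
    int total = 0;
    for (std::size_t i = 0; i < y.size(); ++i)
        for (std::size_t j = 0; j < y.size(); ++j)
            if (y[i][j] == 1 && group[i] == group[j])
                ++total;
    return total;
}

// Assumed triangle definition: ordered triples with i->j, j->k, and i->k.
int transitive_triads(const Adjacency& y) {
    const std::size_t n = y.size();
    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                if (i != j && j != k && i != k &&
                    y[i][j] == 1 && y[j][k] == 1 && y[i][k] == 1)
                    ++total;
    return total;
}

// Change statistic: s(y) with tie (i, j) present minus s(y) with it absent.
// `y` is taken by value so the caller's graph is left untouched.
template <typename Statistic>
int change_statistic(Adjacency y, std::size_t i, std::size_t j, Statistic s) {
    assert(i < y.size() && j < y.size() && i != j);
    y[i][j] = 1;
    const int with_tie = s(y);
    y[i][j] = 0;
    const int without_tie = s(y);
    return with_tie - without_tie;
}
```

Recomputing each statistic twice is the brute-force route; it is only meant to make the definition concrete.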
flowchart LR
start[Create the\nmodel] --> count[Add\ncounters]
count --> count_done{Done?}
count_done --> |Yes| const[Add\nconstraints]
count_done --> |No| count
const --> const_done{Done?}
const_done --> |Yes| init((End))
const_done --> |No| const
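The flow above (create the model, add counters until done, then add constraints until done) maps naturally onto a small builder-style class. The sketch below is a hypothetical stand-in, not barry's interface: `Model`, `add_counter`, `add_constraint`, `allows`, and `count` are names made up for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Flat 0/1 array (illustrative type, not barry's).
using BinaryArray = std::vector<int>;

class Model {
public:
    using Counter = std::function<double(const BinaryArray&)>;
    using Constraint = std::function<bool(const BinaryArray&)>;

    // Step 2 of the flow: register one sufficient statistic.
    Model& add_counter(std::string name, Counter counter) {
        names_.push_back(std::move(name));
        counters_.push_back(std::move(counter));
        return *this;
    }

    // Step 3 of the flow: register one rule restricting the support.
    Model& add_constraint(Constraint constraint) {
        constraints_.push_back(std::move(constraint));
        return *this;
    }

    // True when every constraint accepts the array.
    bool allows(const BinaryArray& array) const {
        for (const auto& constraint : constraints_)
            if (!constraint(array))
                return false;
        return true;
    }

    // One observed statistic per counter, in the order they were added.
    std::vector<double> count(const BinaryArray& array) const {
        assert(allows(array));
        std::vector<double> stats;
        stats.reserve(counters_.size());
        for (const auto& counter : counters_)
            stats.push_back(counter(array));
        return stats;
    }

    const std::vector<std::string>& counter_names() const { return names_; }

private:
    std::vector<std::string> names_;
    std::vector<Counter> counters_;
    std::vector<Constraint> constraints_;
};
```

Returning `*this` from both `add_` methods lets the calls be chained in the same order as the flowchart.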
flowchart LR
start[Hash the\narray] --> exists{Present?}
exists --> |Yes| ptr[[Link\nPointer]]
exists --> |No| addarray[[Compute\nsupport]]
ptr --> End((End))
addarray --> End
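The second flow (hash the array, reuse the support if it is already present, compute it otherwise) is a memoization pattern. A minimal sketch follows, under loud assumptions: the key is taken to be the array's dimensions, the "support" is reduced to a single count, and `Support`, `KeyHash`, and `SupportCache` are hypothetical names rather than barry's types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>
#include <vector>

// Stand-in for the expensive object shared by arrays with the same key.
struct Support {
    std::uint64_t n_arrays;  // size of the full enumeration for these dimensions
};

// Simple hash-combine over the key's entries.
struct KeyHash {
    std::size_t operator()(const std::vector<int>& key) const {
        std::size_t seed = key.size();
        for (int value : key)
            seed ^= std::hash<int>{}(value) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
        return seed;
    }
};

class SupportCache {
public:
    // Returns the support for `dims`, computing it only the first time.
    std::shared_ptr<const Support> get(const std::vector<int>& dims) {
        auto found = cache_.find(dims);      // hashing happens inside find()
        if (found != cache_.end())
            return found->second;            // present: link the pointer
        std::shared_ptr<const Support> support =
            std::make_shared<Support>(compute_support(dims));
        cache_.emplace(dims, support);       // absent: compute and store
        ++n_computed_;
        return support;
    }

    std::size_t n_computed() const { return n_computed_; }

private:
    // Assumed stand-in computation: number of 0/1 arrays with these dimensions.
    static Support compute_support(const std::vector<int>& dims) {
        assert(dims.size() == 2 && dims[0] > 0 && dims[1] > 0);
        const std::uint64_t n_cells =
            static_cast<std::uint64_t>(dims[0]) * static_cast<std::uint64_t>(dims[1]);
        assert(n_cells < 63);
        return Support{std::uint64_t{1} << n_cells};
    }

    std::unordered_map<std::vector<int>, std::shared_ptr<const Support>, KeyHash> cache_;
    std::size_t n_computed_ = 0;
};
```

Sharing one pointer among all arrays with the same key is what makes pooled models cheap: the support is computed once however many arrays reuse it.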
In principle, any dataset we can represent as 0/1 arrays can be modeled with barry. We have applications in social networks, phylogenetics, and health outcomes.
The netplot R package for graph visualization.
barry: Your go-to motif accountant.
fmcmc | ergmito | aphylo | netdiffuseR | ABCoptim
slurmR | barry | rgexf | geese
Using rgexf (my most popular R package w/ 600K downloads)
Co-session network at INSNA’s Sunbelt 2022: Nodes are colored according to their roles: speaker, session chair, and session organizer. (source: gvegayon/gallery)
The change statistic is a real-valued vector whose \(k\)-th entry equals the change in the \(k\)-th statistic when the \(ij\)-th tie is toggled from absent to present (equivalently, the amount lost when the tie is removed from the network). Formally:
\[ \delta(y_{ij}: 0\to 1) = s(\mathbf{y})_{ij}^+ - s(\mathbf{y})_{ij}^- \]
Where \(s(\cdot)\) is a function returning graph \(\mathbf{y}\)’s observed statistics, and \(s(\mathbf{y})_{ij}^+\) and \(s(\mathbf{y})_{ij}^-\) represent its value when \(y_{ij} = 1\) and \(y_{ij} = 0\), respectively.
Furthermore, conditioning on the rest of the network (or array), the log-odds of a tie are linear in the change statistic, so the tie probability is logistic:
\[\begin{equation} \mbox{logit}\left(\mathbb{P}\left(y_{ij} = 1 \mid y_{-ij}\right)\right) = \theta^{\mathbf{t}}\delta\left(y_{ij}:0\to 1\right), \end{equation}\]
\[\begin{equation} \mathbb{P}\left(y_{ij} = 1 \mid y_{-ij}\right) = \frac{1}{1 + \exp\left\{-\theta^{\mathbf{t}}\delta\left(y_{ij}:0\to 1\right)\right\}} \end{equation}\]
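As a numerical illustration of the logistic expression, the sketch below takes a parameter vector and a change-statistic vector and returns the conditional tie probability. The function name `tie_probability` is hypothetical, and any parameter values plugged into it are made up for illustration, not estimates from a fitted model.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// P(y_ij = 1 | rest of the array) = 1 / (1 + exp(-theta' delta)),
// where delta is the change statistic for toggling tie (i, j) from 0 to 1.
double tie_probability(const std::vector<double>& theta,
                       const std::vector<double>& delta) {
    assert(theta.size() == delta.size());
    double linear_predictor = 0.0;
    for (std::size_t k = 0; k < theta.size(); ++k)
        linear_predictor += theta[k] * delta[k];
    return 1.0 / (1.0 + std::exp(-linear_predictor));
}
```

For instance, with the change statistics for tie 33-69 (1 edge, 33 triangles, 1 homophilous tie) and the made-up parameters (-2, 0.05, 1), the linear predictor is 0.65 and the probability is about 0.657. When all parameters are zero the probability is 0.5.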
george.vegayon at utah – https://ggvy.cl/slides/sci2023